01 Data Visualization
This chapter accompanies 1 Data Visualization in R4DS.
1 Chapter Follow-Along
Use the following code cell to follow along while reading the chapter. (You are encouraged to do this in RStudio instead!)
These supplementary notes use something called webr, which allows R to run in your browser (no installation needed)!
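As a starting point for following along, here is a minimal sketch of the kind of plot the chapter builds. It assumes the ggplot2 and palmerpenguins packages are installed, and the object name p is only illustrative.

```r
# Assumption: ggplot2 and palmerpenguins are installed in this session
library(ggplot2)
library(palmerpenguins)

# Scatterplot of body mass against flipper length, coloured by species
p <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(aes(colour = species))

# Printing the object draws the plot; expect a warning about rows with
# missing values being removed
p
```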
2 Exercises
1.2 First Steps
- How many rows are in penguins? How many columns?
- What does the bill_depth_mm variable in the penguins data frame describe? Read the help for ?penguins to find out.
- Make a scatterplot of bill_depth_mm vs. bill_length_mm. That is, make a scatterplot with bill_depth_mm on the y-axis and bill_length_mm on the x-axis. Describe the relationship between these two variables.
- What happens if you make a scatterplot of species vs. bill_depth_mm? What might be a better choice of geom?
- Why does the following give an error and how would you fix it?
- What does the na.rm argument do in geom_point()? What is the default value of the argument? Create a scatterplot where you successfully use this argument set to TRUE.
- Add the following caption to the plot you made in the previous exercise: “Data come from the palmerpenguins package.” Hint: Take a look at the documentation for labs().
- Recreate the following visualization. What aesthetic should bill_depth_mm be mapped to? And should it be mapped at the global level or at the geom level?
- Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions. (Note: ggplot2 accepts either “color” or “colour” as an argument. I am changing all instances to the Canadian spelling.)
- Will these two graphs look different? Why/why not?
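Several of the exercises above turn on the difference between mapping an aesthetic globally and mapping it inside a single geom. The sketch below illustrates that difference using the mpg data frame bundled with ggplot2 rather than the exercise code itself. The object names p_global and p_local are illustrative only.

```r
library(ggplot2)

# Global mapping: colour is set in ggplot(), so every layer inherits it.
# Both the points and the smooth lines are split by drv.
p_global <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point() +
  geom_smooth(se = FALSE)

# Geom-level mapping: colour is set only in geom_point(), so the points
# are coloured by drv but a single smooth line is fit to all the data.
p_local <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = drv)) +
  geom_smooth(se = FALSE)

p_global
p_local
```

Comparing the two plots side by side is a quick way to check which layers picked up the colour mapping.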
1.4 Visualizing distributions
- Make a bar plot of species of penguins, where you instead assign species to the y aesthetic. How is this plot different?
- How are the following two plots different? Which aesthetic, colour or fill, is more useful for changing the colour of bars?
- Make a histogram of the carat variable in the diamonds dataset that is available when you load the ggplot2 package. Experiment with different binwidths. What binwidth reveals the most interesting patterns?
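One way to experiment with binwidths is to build the same histogram several times in a loop. The sketch below does this for body_mass_g in penguins, not for the carat exercise. It assumes palmerpenguins is installed, and the binwidth values are arbitrary choices for illustration.

```r
# Assumption: ggplot2 and palmerpenguins are installed in this session
library(ggplot2)
library(palmerpenguins)

# Arbitrary binwidths (in grams) chosen to show too narrow, moderate, too wide
binwidths <- c(20, 200, 2000)

# Build one histogram per binwidth and keep them in a list
plots <- lapply(binwidths, function(bw) {
  ggplot(penguins, aes(x = body_mass_g)) +
    geom_histogram(binwidth = bw, na.rm = TRUE) +
    labs(title = paste("binwidth =", bw))
})

# Print each plot in turn to compare them
plots[[1]]
plots[[2]]
plots[[3]]
```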
1.5.5 Visualizing Relationships
- The mpg data frame that is bundled with the ggplot2 package contains 234 observations collected by the US Environmental Protection Agency on 38 car models. Which variables in mpg are categorical? Which variables are numerical? (Hint: Type ?mpg to read the documentation for the dataset.) How can you see this information when you run mpg?
- Make a scatterplot of hwy vs. displ using the mpg data frame. Next, map a third, numerical variable to colour, then size, then both colour and size, then shape. How do these aesthetics behave differently for categorical vs. numerical variables?
- In the scatterplot of hwy vs. displ, what happens if you map a third variable to linewidth?
- What happens if you map the same variable to multiple aesthetics?
- Make a scatterplot of bill_depth_mm vs. bill_length_mm and colour the points by species. What does adding colouring by species reveal about the relationship between these two variables? What about faceting by species?
- Why does the following yield two separate legends? How would you fix it to combine the two legends?
- Create the two following stacked bar plots. Which question can you answer with the first one? Which question can you answer with the second one?
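For the stacked bar plot exercise, it may help to see how the position argument of geom_bar() changes what a stacked bar shows. The sketch below is not the exercise code. It uses mpg with class and drv, two variables picked only for illustration.

```r
library(ggplot2)

# Default stacking: bar height is the number of cars in each class
p_count <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar()

# position = "fill": every bar is rescaled to height 1, so the segments
# show the proportion of each drv within a class
p_prop <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "fill") +
  labs(y = "proportion")

p_count
p_prop
```

Swapping which variable goes on x and which goes on fill changes the question the plot answers, which is the point of the exercise above.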